Abstract

It is shown that the Hall, Hu and Marron [Hall, P., Hu, T., and Marron J.S. (1995), Improved Variable Window Kernel Estimates of Probability Densities, Annals of Statistics, 23, 1-10] modification of Abramson's [Abramson, I. (1982), On Bandwidth Variation in Kernel Estimates - A Square-root Law, Annals of Statistics, 10, 1217-1223] variable bandwidth kernel density estimator satisfies the optimal asymptotic properties for estimating densities with four uniformly continuous derivatives, uniformly on bounded sets where the preliminary estimator of the density is bounded away from zero.
1 Introduction and statement of the main result

Let $f$ be a density on the real line and let $X_i$, $i \in \mathbb{N}$, be independent, identically distributed random variables with distribution of density $f$. Abramson (1982) discovered that if in the usual kernel density estimator one allows the bandwidth $h_n$ to vary with the data according to the 'square root law', that is, if one takes

$$f_n(t) = \frac{1}{nh_n}\sum_{i=1}^n \gamma_t^{1/2}(X_i)\,K\Big(\frac{t-X_i}{h_n}\,\gamma_t^{1/2}(X_i)\Big) \tag{1}$$

instead of the classical estimators with the same sequence $h_n \to 0$, where $\gamma_t(x) = f(x) \vee (f(t)/10)$, then a bias reduction phenomenon occurs. This has been used by Hall, Hu and Marron (1995) (following Hall and Marron (1988), corr. (1992)), McKay (1993) and Novak (1999), to propose density estimators which are non-negative at all points and which estimate $f(t)$ at any given $t$ at the $L_2$-norm loss minimax rate of $n^{-4/9}$ if the density $f$ is four times differentiable with continuous and bounded derivatives.

Of course, the expression (1) is not an estimator of $f$ as it depends on the unknown $f$ through $\gamma_t$, but it becomes one if $f$ is replaced by a preliminary estimator (based on the same data, or on an independent set of data). As in the mentioned papers, expressions such as (1) will be referred to here as 'ideal' estimators.
It was once believed that $\gamma_t$ in (1) could be replaced by $f$, but Terrell and Scott (1992) showed that in this case the bias reduction at a single $t$ depends heavily on the tail of $f$ and becomes negligible in the normal case (see also Hall, Hu and Marron (1995) and McKay (1993)). Taking $\gamma_t$ instead of $f$ as Abramson did constitutes a way to deal with the tail effects on the localities $t$. Hall, Hu and Marron (1995), McKay (1993) and Novak (1999) also devised other ways of dealing with the problem. In particular, Hall, Hu and Marron proposed the ideal estimator

$$f_n(t) = \frac{1}{nh_n}\sum_{i=1}^n K\Big(\frac{t-X_i}{h_n}f^{1/2}(X_i)\Big)f^{1/2}(X_i)\,I(|t-X_i| < h_nB), \tag{2}$$

for some $B > 0$. Novak replaces $h_n$ in the indicator by $h_n/f^{1/2}(t)$ and considers powers other than 1/2 as well, and McKay replaces $\gamma_t(x)$ in (1) by a smooth function $\alpha(x) = c\,\nu^{1/2}(f(x)/c^2)$ with $\nu(t) = t$ for all $t > t_0 > 1$ with the first four derivatives of $\nu$ vanishing at zero. We will focus our attention only on the simplest of these ideal estimators, which is (2), although our results should hold for the other versions as well. The ideal estimator will only be a means to study the 'true' estimator, obtained from the ideal by replacement of $f$ by a preliminary estimator.
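As an illustration only, the ideal estimator (2) can be sketched numerically as follows. This is a minimal sketch under stated assumptions, not the authors' code: the biweight kernel (so that $T = 1$), the function names `biweight` and `ideal_hhm_estimate`, and the use of NumPy are all choices made here for the example. The true density `f` must be supplied, which is exactly what makes the estimator 'ideal'.

```python
import numpy as np

def biweight(v):
    """Biweight kernel: non-negative, symmetric, supported on [-1, 1] (T = 1)."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) <= 1.0, 15.0 / 16.0 * (1.0 - v**2) ** 2, 0.0)

def ideal_hhm_estimate(t, sample, f, h, B, kernel=biweight):
    """Evaluate display (2) at the points t.

    f is the TRUE density (available only in this idealised setting),
    h is the bandwidth h_n and B is the constant in the indicator.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    x = np.asarray(sample, dtype=float)
    root = np.sqrt(f(x))                   # f^{1/2}(X_i)
    diff = t[:, None] - x[None, :]         # t - X_i for every (t, X_i) pair
    window = np.abs(diff) < h * B          # indicator I(|t - X_i| < h B)
    terms = kernel(diff / h * root) * root * window
    return terms.sum(axis=1) / (len(x) * h)
```

For a standard normal sample one would pass the normal density as `f` and choose `B` at least $T/r^{1/2}$ for the level $r$ of interest; the returned values are non-negative by construction because the kernel is.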

Specifically, in this article we study the uniform approximation of a density $f$ by estimators of the form

$$\hat f(t; h_{1,n}, h_{2,n}) = \frac{1}{nh_{2,n}}\sum_{i=1}^n K\Big(\frac{t-X_i}{h_{2,n}}\hat f^{1/2}(X_i; h_{1,n})\Big)\hat f^{1/2}(X_i; h_{1,n})\,I(|t-X_i| < h_{2,n}B), \tag{3}$$

where $\hat f(x; h_{1,n})$ is the classical kernel density estimator

$$\hat f(x; h_{1,n}) = \frac{1}{nh_{1,n}}\sum_{i=1}^n K\Big(\frac{x-X_i}{h_{1,n}}\Big)$$

and $h_{i,n}$, $i = 1, 2$, are two sequences of bandwidths that tend to zero as $n \to \infty$. Ideally, we would like to prove results for $\|\hat f(t;h_{1,n},h_{2,n}) - f(t)\|_\infty$; however, controlling the bias part of this error, $|E\hat f(t;h_{1,n},h_{2,n}) - f(t)|$, seems to require that $f(t)$ be bounded away from zero, so we will consider instead the supremum of the estimation error on the 'ideal' regions

$$D_r = D_r(f) := \{t : f(t) > r,\ |t| < 1/r\}, \quad r > 0, \tag{4}$$

and will eventually replace $D_r$ by a region that depends on the data only and that can be made arbitrarily close to the positivity set of $f$. (We will not display the argument $f$ in $D_r(f)$ unless confusion is possible.) It is known (Hall, Hu and Marron (1995), Novak (1999)) that the bias reduction does hold for $f$ and $K$ four times differentiable and that then one has a bias of the order of $h_{2,n}^4$. This leads almost immediately in the case of the ideal estimator, and with some relatively hard work in the case of the real estimator, to an a.s. rate of convergence of $\hat f(t;h_{1,n},h_{2,n}) - f(t)$ (fixed $t$) of the order of $n^{-4/9}$ if we take $h_{2,n} \simeq n^{-1/9}$ and $h_{1,n} \simeq n^{-2/9}$ or of a smaller order, and this is best possible ($n^{-4/9}$ is the minimax rate in the $L_2(P)$ norm for estimating $f(t)$ four times differentiable with continuity). The minimax rate for the sup norm in this case is $(n/\log n)^{-4/9}$ and we show in this article that this rate is achieved by the estimator (3) uniformly in $D_r$ and in a similar data-dependent region. (See e.g. Efromovich (1999) for minimax rates.) Concretely, we prove the following theorem, in fact, as explained below, a uniform version of it.
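To make the two-stage construction in (3) concrete, here is a minimal numerical sketch, offered as an illustration under assumptions rather than as the authors' implementation. It assumes a biweight kernel ($T = 1$) and uses the bandwidths $h_{1,n} = n^{-2/9}$ and $h_{2,n} = ((\log n)/n)^{1/9}$ discussed in the text; the names `classical_kde` and `true_hhm_estimate` are hypothetical. The pilot is evaluated at the sample points themselves, so the memory cost is of order $n^2$.

```python
import numpy as np

def biweight(v):
    """Biweight kernel: non-negative, symmetric, supported on [-1, 1] (T = 1)."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) <= 1.0, 15.0 / 16.0 * (1.0 - v**2) ** 2, 0.0)

def classical_kde(x, sample, h, kernel=biweight):
    """Classical fixed-bandwidth kernel density estimate at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    s = np.asarray(sample, dtype=float)
    return kernel((x[:, None] - s[None, :]) / h).sum(axis=1) / (len(s) * h)

def true_hhm_estimate(t, sample, B, kernel=biweight):
    """Evaluate display (3): the ideal estimator with f replaced by a pilot."""
    s = np.asarray(sample, dtype=float)
    n = len(s)
    h1 = n ** (-2.0 / 9.0)                     # pilot (undersmoothed) bandwidth
    h2 = (np.log(n) / n) ** (1.0 / 9.0)        # second-stage bandwidth
    pilot_root = np.sqrt(classical_kde(s, s, h1, kernel))  # \hat f^{1/2}(X_i; h1)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    diff = t[:, None] - s[None, :]
    window = np.abs(diff) < h2 * B             # indicator I(|t - X_i| < h2 B)
    terms = kernel(diff / h2 * pilot_root) * pilot_root * window
    return terms.sum(axis=1) / (n * h2)
```

The pilot is strictly positive at every sample point because each $X_i$ contributes $K(0)/(nh_{1,n})$ to its own estimate, so the square root is always defined.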

Theorem 1 Assume the density $f$ and its first four derivatives are uniformly continuous and bounded, that the same is true for the kernel $K$, which, moreover, is non-negative, has support contained in $[-T,T]$, $T < \infty$, integrates to 1 and is symmetric about zero. Set $h_{2,n} = ((\log n)/n)^{1/9}$ and $h_{1,n} = n^{-2/9}$ (or $h_{1,n} = n^{-(2+\eta)/9}$ for some $0 < \eta < 1$), $n \in \mathbb{N}$. Then, for all $r > 0$ and constant $B > T/r^{1/2}$ in the definition of $\hat f(t;h_{1,n},h_{2,n})$ in (3), we have

$$\sup_{t\in D_r}\big|\hat f(t;h_{1,n},h_{2,n}) - f(t)\big| = O_{a.s.}\Big(\Big(\frac{\log n}{n}\Big)^{4/9}\Big). \tag{5}$$

If $\hat D_r^n$ is defined as

$$\hat D_r^n = \{t : \hat f(t;h_{1,n}) > 2r,\ |t| < 1/r\} \tag{6}$$

then we also have

$$\sup_{t\in \hat D_r^n}\big|\hat f(t;h_{1,n},h_{2,n}) - f(t)\big| = O_{a.s.}\Big(\Big(\frac{\log n}{n}\Big)^{4/9}\Big). \tag{7}$$

(Actually, $h_{2,n}$ needs only be asymptotically of the order $((\log n)/n)^{1/9}$ in the sense that

$$0 < \liminf_{n}\frac{h_{2,n}}{((\log n)/n)^{1/9}} \le \limsup_{n}\frac{h_{2,n}}{((\log n)/n)^{1/9}} < \infty,$$

and the same comment applies to $h_{1,n}$, but for simplicity we will work with exact values.)

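We note that, by the zero-one law, statement (5) is equivalent to the existence of a finite constant $C$ such that

$$\limsup_{n\to\infty}\Big(\frac{n}{\log n}\Big)^{4/9}\sup_{t\in D_r}\big|\hat f(t;h_{1,n},h_{2,n}) - f(t)\big| = C \quad \text{a.s.} \tag{8}$$

and likewise for (7). And (8) holds for some $C < \infty$ if and only if there is $C' < \infty$ such that

$$\lim_{k\to\infty}\Pr\Big\{\sup_{n\ge k}\Big(\frac{n}{\log n}\Big)^{4/9}\sup_{t\in D_r}\big|\hat f(t;h_{1,n},h_{2,n}) - f(t)\big| > C'\Big\} = 0.$$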



So, the following definition is justified (it is similar to the definition of uniform Glivenko-Cantelli classes of functions in Dudley, Giné and Zinn (1991)):

Definition 1 For each $n \in \mathbb{N}$, let $Z_n(x_1,\dots,x_n; f)$ be functions of $n$ real variables $x_1,\dots,x_n$ and of the density $f$, $f \in \mathcal{D}$, where $\mathcal{D}$ is a collection of densities. We say that the collection of random variables $Z_n(X_1,\dots,X_n,f)$, $f \in \mathcal{D}$, $n \in \mathbb{N}$, is a.s. asymptotically of the order of $a_n$ uniformly in $f \in \mathcal{D}$,

$$Z_n(X_1,\dots,X_n,f) = O_{a.s.}(a_n) \text{ uniformly in } f \in \mathcal{D},$$

if there exists $C < \infty$ such that

$$\lim_{k\to\infty}\sup_{f\in\mathcal{D}}\Pr\nolimits_f\Big\{\sup_{n\ge k}\frac{1}{a_n}\big|Z_n(X_1,\dots,X_n,f)\big| > C\Big\} = 0, \tag{9}$$

and it is $o_{a.s.}(a_n)$ uniformly in $f \in \mathcal{D}$ if the limit (9) holds for every $C > 0$.



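For $0 < C < \infty$ and non-negative function $z$ such that $z(\delta) \searrow 0$ as $\delta \searrow 0$, define the class of densities

$$\mathcal{D}_{C,z} = \Big\{f : f \text{ is a density},\ \|f^{(k)}\|_\infty \le C,\ 0 \le k \le 4,\ \text{and } \sup_{t\in\mathbb{R},\ |u|\le\delta}\big|f^{(4)}(t+u) - f^{(4)}(t)\big| \le z(\delta),\ 0 < \delta \le 1\Big\}. \tag{10}$$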



Here is the stronger version of Theorem 1 that we prove in this article.

Theorem 2 Under the hypotheses of Theorem 1 we have

$$\sup_{t\in D_r(f)}\big|\hat f(t;h_{1,n},h_{2,n}) - f(t)\big| = O_{a.s.}\Big(\Big(\frac{\log n}{n}\Big)^{4/9}\Big) \text{ uniformly in } f \in \mathcal{D}_{C,z} \tag{11}$$

and

$$\sup_{t\in \hat D_r^n}\big|\hat f(t;h_{1,n},h_{2,n}) - f(t)\big| = O_{a.s.}\Big(\Big(\frac{\log n}{n}\Big)^{4/9}\Big) \text{ uniformly in } f \in \mathcal{D}_{C,z} \tag{12}$$

for all $0 < C < \infty$ and function $z \ge 0$ such that $z(h) \searrow 0$ as $h \searrow 0$.

It is natural that, as shown by Hall, Hu and Marron (1995), the estimator (3) be locally (that
is, at each point t) asymptotically better than the classical kernel estimator that it modifies 
because, after all, it is obtained from the classical one by local or spatial adaptation of the 
bandwidth. This theorem shows that, up to a logarithmic factor, the improvement is not only 
local but holds uniformly over all t for which f(t) is slightly above zero, and uniformly as well 
over large classes of densities with four continuous derivatives. This may seem surprising and is 
certainly desirable. See the comments by Donoho, Johnstone, Kerkyacharian and Picard (1995) 
about the scarcity of theoretical results on 'spatially adaptive' estimators. 

We do not know of any other non-negative estimators of a density that achieve such good rates in sup-norm loss (although Abramson's or Novak's may). Thresholding wavelet density estimators (Donoho, Johnstone, Kerkyacharian and Picard (1996)) also constitute a kind of adaptation to the local behavior of $f$ since wavelets pick up local behavior; these estimators may not be non-negative on the whole domain, but are rate adaptive to the smoothness of $f$ in sup-norm loss, in particular satisfying Theorem 2 but also attaining the rate $((\log n)/n)^{t/(2t+1)}$ uniformly on densities in the unit ball of $C^t(\mathbb{R})$ (Giné and Nickl (2008)). See also Giné and Nickl (2009) for estimators with this property based on convolution kernels of higher order and Lepski's method.

We first prove Theorem 2 for the ideal estimator and then show that the supremum over $D_r$ of the difference between the true and the ideal estimators is of the order of $((\log n)/n)^{4/9}$. For this we use empirical process and U-process techniques: basically, the classes of functions involved in the supremum in (5) and in other suprema appearing in the proofs are of Vapnik-Červonenkis type (see e.g. de la Peña and Giné (1999)) and therefore we can use the appropriate version of Talagrand's exponential inequality for empirical processes (as in Einmahl and Mason (2000) and Giné and Guillou (2002)), and an inequality due to Major (2006) for U-processes. We relegate to an appendix the proof that the relevant classes of functions are of VC type, so that we get this technicality out of the way in the main proofs.

Since we use empirical processes, in order to avoid measurability problems and without loss of generality, we assume throughout that the variables $X_i$ are the coordinate functions on $\Omega = \mathbb{R}^{\mathbb{N}}$, equipped with the product $\sigma$-algebra and the probability measure $\Pr = P^{\mathbb{N}}$, $dP(x) = f(x)\,dx$, that we will denote as $\Pr_f$ if (and only if) we need to distinguish among several densities.

2 The ideal estimator 

In this section we obtain the asymptotic size of the uniform deviation of the ideal estimator (2) from the density $f$, that is, we will consider the a.s. asymptotic size of

$$\sup_{t\in D_r}|f(t;h_n) - f(t)| := \|f(t;h_n) - f(t)\|_{D_r}.$$

As usual this quantity is divided into the bias part, $\|Ef(t;h_n) - f(t)\|_{D_r}$, and the stochastic part or variance part, $\|f(t;h_n) - Ef(t;h_n)\|_{D_r}$. Each is studied in a different subsection. There is no problem with extending the supremum for the variance part over the whole of $\mathbb{R}$; the problem is, as mentioned above, with the bias.

We will use the shorthand notations

$$f_n(t;h) = f_n(t) = f(t;h_n)$$

so that we display only either $h_n$ or $n$ but not both; the first expression is used in this section and the second in the next.

2.1 Stochastic part of the 'ideal' estimator 

In this subsection we assume: 

Assumptions 1 The sequence $h_n$ will satisfy the following classical conditions:

$$h_n \searrow 0, \qquad \frac{nh_n}{|\log h_n|} \to \infty, \qquad \frac{|\log h_n|}{\log\log n} \to \infty, \qquad \text{and} \qquad nh_n \nearrow, \tag{13}$$

as $n \to \infty$. The kernel $K$ will be a non-negative left or right continuous function, bounded, with support contained in $[-T,T]$ for some $T < \infty$, and of bounded variation. $f$ is a bounded density.

The proof of the following proposition is patterned after the proof of a similar theorem in Giné and Guillou (2002), and it consists of blocking and application of Talagrand's inequality (60). It extends to the variable bandwidth estimator a well known uniform rate for the usual kernel estimator (Silverman (1978), formula (9)).

Proposition 1 Under the hypotheses in Assumptions 1,

$$\|f_n - Ef_n\|_\infty = O_{a.s.}\Big(\sqrt{\frac{\log h_n^{-1}}{nh_n}}\Big)$$

uniformly over all densities $f$ such that $\|f\|_\infty \le C$, for any $0 < C < \infty$.

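Proof. We block the terms between dyadic integers as follows, where, for ease of notation, we set $I_{i,n}(t) := I(|t - X_i| < h_nB)$ and $I_{i,h}(t) := I(|t - X_i| < hB)$:

$$\begin{aligned}
&\Pr\Big\{\max_{2^{k-1}<n\le 2^k}\sqrt{\frac{nh_n}{\log h_n^{-1}}}\,\|f_n - Ef_n\|_\infty > \lambda\Big\}\\
&\quad\le \Pr\Big\{\max_{2^{k-1}<n\le 2^k}\frac{1}{\sqrt{2^{k-1}h_{2^k}\log h_{2^k}^{-1}}}\sup_{t\in\mathbb{R}}\Big|\sum_{i=1}^n\Big[K\Big(\frac{t-X_i}{h_n}f^{1/2}(X_i)\Big)f^{1/2}(X_i)I_{i,n}(t) - EK\Big(\frac{t-X_i}{h_n}f^{1/2}(X_i)\Big)f^{1/2}(X_i)I_{i,n}(t)\Big]\Big| > \lambda\Big\}\\
&\quad\le \Pr\Big\{\max_{2^{k-1}<n\le 2^k}\ \sup_{t\in\mathbb{R},\ h_{2^k}\le h\le h_{2^{k-1}}}\Big|\sum_{i=1}^n\Big[K\Big(\frac{t-X_i}{h}f^{1/2}(X_i)\Big)f^{1/2}(X_i)I_{i,h}(t) - EK\Big(\frac{t-X_i}{h}f^{1/2}(X_i)\Big)f^{1/2}(X_i)I_{i,h}(t)\Big]\Big| > \lambda\sqrt{2^{k-1}h_{2^k}\log h_{2^k}^{-1}}\Big\}
\end{aligned}\tag{14}$$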

for any $\lambda > 0$, where we used that $h_n$ decreases and that the function $x \mapsto x\log x$ is decreasing for $x < 1/e$. As we see in the Appendix, the class of functions

$$\mathcal{F} = \Big\{K\Big(\frac{t-\cdot}{h}f^{1/2}(\cdot)\Big)f^{1/2}(\cdot)\,I(|t-\cdot| < hB) : t \in \mathbb{R},\ h > 0\Big\} \tag{15}$$

is a bounded VC class of measurable functions with respect to the constant envelope $U := \|K\|_V\|f\|_\infty^{1/2}$, where $\|K\|_V$ is the total variation norm of $K$. Hence, the subclasses

$$\mathcal{F}_k = \Big\{K\Big(\frac{t-\cdot}{h}f^{1/2}(\cdot)\Big)f^{1/2}(\cdot)\,I(|t-\cdot| < hB) : t \in \mathbb{R},\ h_{2^k} \le h \le h_{2^{k-1}}\Big\} \tag{16}$$

are VC classes of functions with respect to $U_k = U$ also and with the same characteristics $A(v)$ and $v$ as $\mathcal{F}$. Next, in order to apply Talagrand's inequality (60), we obtain a sensible bound $\sigma_k^2$ for the maximum variance of the functions in $\mathcal{F}_k$:

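$$\begin{aligned}
\int K^2\Big(\frac{t-x}{h}f^{1/2}(x)\Big)f(x)\,I(|t-x| < hB)\,f(x)\,dx &\le \int_{\mathbb{R}}K^2\Big(\frac{t-x}{h}f^{1/2}(x)\Big)f^2(x)\,dx\\
&= h\int_{\mathbb{R}}K^2\big(uf^{1/2}(t-hu)\big)f^2(t-hu)\,du\\
&\le h\int_{\mathbb{R}}\big(\|K\|_\infty^2\|f\|_\infty^2\big)\wedge\big(\|K\|_\infty^2T^4/|u|^4\big)\,du\\
&= 2h\|K\|_\infty^2\Big[\int_0^{T/\|f\|_\infty^{1/2}}\|f\|_\infty^2\,du + \int_{T/\|f\|_\infty^{1/2}}^\infty\frac{T^4}{u^4}\,du\Big]\\
&= \frac{8}{3}\,hT\|K\|_\infty^2\|f\|_\infty^{3/2}.
\end{aligned}\tag{17}$$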

So, we can take $\sigma_k^2 := \frac{8}{3}T\|K\|_\infty^2\|f\|_\infty^{3/2}\,2h_{2^k}$ (using the fourth condition in (13)). $U_k = U$ is eventually much larger than $\sigma_k$ and

$$\sqrt{2^k}\,\sigma_k\sqrt{\log\frac{AU_k}{\sigma_k}} \ll 2^k\sigma_k^2$$

by the second condition in (13) (here and elsewhere, the sign $\ll$ should be read as 'of smaller order than' when the indexing variable, in this case $k$, tends to infinity). If $\lambda$ in (14) is taken to be large enough so that

$$C_1\sqrt{2^k}\,\sigma_k\sqrt{\log\frac{AU_k}{\sigma_k}} < \lambda\sqrt{2^{k-1}h_{2^k}\log h_{2^k}^{-1}} \ll 2^k\sigma_k^2, \tag{18}$$

where $C_1$ is one of the constants in Talagrand's inequality (60), then this inequality, applied to the inequalities (14), gives

$$\Pr\Big\{\max_{2^{k-1}<n\le 2^k}\sqrt{\frac{nh_n}{\log h_n^{-1}}}\,\|f_n - Ef_n\|_\infty > \lambda\Big\} \le C_2\exp\Big(-C_3\frac{\lambda^2\,2^{k-1}h_{2^k}\log h_{2^k}^{-1}}{2^k\sigma_k^2}\Big). \tag{19}$$

Set $\lambda = L\sqrt{T}\,\|K\|_\infty C^{3/4}$. Then we can choose $L$ large enough such that inequality (18) is satisfied for all $k \ge k_0$, $k_0$ depending on $\lambda$ only, and for this $\lambda$ inequality (19) becomes

$$\sup_{f:\|f\|_\infty\le C}\Pr\Big\{\max_{2^{k-1}<n\le 2^k}\sqrt{\frac{nh_n}{\log h_n^{-1}}}\,\|f_n - Ef_n\|_\infty > \lambda\Big\} \le C_2\exp\Big(-\frac{3C_3L^2\log h_{2^k}^{-1}}{2^5}\Big),$$

where the term at the right hand side is the general term of a convergent series because $(\log h_{2^k}^{-1})/\log k \to \infty$ by the third condition in (13). This proves the proposition. ■

This result, which is good enough for our purposes, can possibly be made more precise for each particular density $f$: for instance, Sang (2008) proves

$$\lim_{n\to\infty}\sqrt{\frac{nh_n}{2\log h_n^{-1}}}\,\|f_n - Ef_n\|_\infty = \|K\|_2\|f\|_\infty^{3/4}$$

if the ideal Hall, Hu, Marron estimator is replaced by the ideal Novak estimator with $\alpha = 1/2$, and under some additional, natural assumptions. This suggests that the rate in Proposition 1 is optimal. Also, Theorem 1 admits more general and stronger versions: see Mason and Swanepoel (2008) for a recent result along the lines of the previous theorem, with uniformity in bandwidth added, and for a general class of estimators that includes ours.

2.2 Bias of the 'ideal' estimator 

The assumptions on $f$, $K$ and $h_n$ in this section are as follows:

Assumptions 2 We assume that the densities $f$ and the kernel $K$ as well as their first four derivatives are bounded and uniformly continuous, and moreover that $K$ has support contained in $[-T,T]$, $T < \infty$, it integrates to 1 and is symmetric about zero. We also assume $h_n \to 0$ as $n \to \infty$ (and $h_n > 0$).



We set

$$\bar f(t;h) := Ef_n(t;h) = \frac{1}{h}\int f^{3/2}(x)K\Big(\frac{t-x}{h}f^{1/2}(x)\Big)I(|x-t| < Bh)\,dx = \int_{-B}^{B}f^{3/2}(t+hw)K\big(wf^{1/2}(t+hw)\big)\,dw = \int_{-B}^{B}g_{t,w}(hw)\,dw, \tag{20}$$

where, for $t$ and $w$ fixed,

$$g_{t,w}(u) = f^{3/2}(t+u)K\big(wf^{1/2}(t+u)\big). \tag{21}$$

If no confusion may arise, we drop the subindices $t$, $w$ from $g$. To estimate the bias of the ideal estimator, $\bar f(t;h) - f(t)$, one develops $g(hw)$ about zero and integrates. For further reference, we record the first four derivatives of $g(u)$: by direct computation or e.g. from Novak (1999), we have, with $r(u) = f^{3/2}(t+u)$ and $s(u) = wf^{1/2}(t+u)$,

$$g(u) = r(u)K(s(u)), \qquad g'(u) = r'(u)K(s(u)) + r(u)s'(u)K'(s(u)),$$

and, dropping the arguments for simplicity,

$$\begin{aligned}
g'' &= r''K + (2r's' + rs'')K' + r(s')^2K'',\\
g''' &= r'''K + (3r''s' + 3r's'' + rs''')K' + 3(r'(s')^2 + rs's'')K'' + r(s')^3K''',\\
g^{(4)} &= r^{(4)}K + (4r'''s' + 6r''s'' + 4r's''' + rs^{(4)})K' + (6r''(s')^2 + 12r's's'' + 4rs's''' + 3r(s'')^2)K''\\
&\quad + (4r'(s')^3 + 6r(s')^2s'')K''' + r(s')^4K^{(4)}.
\end{aligned}\tag{22}$$



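Proposition 2 Under the hypotheses in Assumptions 2, if the constant $B$ in the definition of $f_n(t;h_n)$ satisfies $B > T/r^{1/2}$, then, for all $0 < C < \infty$ and functions $z \ge 0$ with $z(h) \searrow 0$ as $h \searrow 0$, we have

$$\lim_{n\to\infty}\sup_{f\in\mathcal{D}_{C,z}}\sup_{t\in D_r}\Big|\frac{Ef_n(t;h_n) - f(t)}{h_n^4} - H(t,f,K)\Big| = 0 \tag{23}$$

and

$$\sup_{f\in\mathcal{D}_{C,z}}\sup_{t\in D_r}|H(t,f,K)| < \infty, \tag{24}$$

where

$$H(t,f,K) = \Big[\frac{(f')^4(t)}{f^5(t)} - \frac{3(f')^2(t)f''(t)}{2f^4(t)} + \frac{4f'(t)f'''(t) + 3(f'')^2(t)}{12f^3(t)} - \frac{f^{(4)}(t)}{24f^2(t)}\Big]\int v^4K(v)\,dv.$$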



Proof. Since $f$ and $K$ and their first four derivatives are continuous and $f > r/2$ on a neighborhood of $D_r$, it follows that, if $g_{t,w}$ is as defined in (21), there exists $n_0 < \infty$ such that, for all $t \in D_r$ and for all $w \in \mathbb{R}$, $g_{t,w}^{(4)}$ is continuous on $[-Bh_n, Bh_n]$ for all $n \ge n_0$: note that $g^{(4)}(u)$ is a linear combination of $K$ and its first four derivatives at $wf^{1/2}(t+u)$ whose coefficients are fractions that have products of powers of $w$ and powers of $f(t+u)$ and its derivatives in the numerator, and powers of $f(t+u)$ in the denominator (see (22)). Therefore, Taylor expansion gives

$$g(u) = \sum_{k=0}^{3}\frac{g^{(k)}(0)}{k!}u^k + \frac{u^4}{4!}E_\tau g^{(4)}(\tau u) \tag{25}$$

where $\tau$ is a random variable with density $\lambda(x) = 4(1-x)^3$, $0 \le x \le 1$, that does not depend on $t$, $w$ or $u$, and $E_\tau$ denotes expectation with respect to this variable. Equation (25) can be easily verified by integration by parts in $\int_0^1 4(1-x)^3g^{(4)}(xu)\,dx$. Next note that

$$\int_{-B}^{B}g_{t,w}(0)\,dw = \int_{-B}^{B}f^{3/2}(t)K\big(wf^{1/2}(t)\big)\,dw = \int_{-Bf^{1/2}(t)}^{Bf^{1/2}(t)}f(t)K(v)\,dv = f(t) \tag{26}$$



since the support of $K$ is contained in $[-Bf^{1/2}(t), Bf^{1/2}(t)]$ by the hypothesis on $B$ and since $K$ integrates to 1. Further, since $s'$ contains a $w$ factor, there are functions $c_i(f,t)$, $i = 1, 2$, such that

$$\int_{-B}^{B}w\,g'_{t,w}(0)\,dw = \int_{-Bf^{1/2}(t)}^{Bf^{1/2}(t)}\big(c_1(f,t)vK(v) + c_2(f,t)v^2K'(v)\big)\,dv = 0$$

because $K$ is even and $K'$ is odd. Similarly (that is, using only the symmetry properties of $K$ and its derivatives), we also get $\int_{-B}^{B}w^3g'''_{t,w}(0)\,dw = 0$. That these two integrals vanish is obvious and not surprising; what is remarkable is that also $\int_{-B}^{B}w^2g''_{t,w}(0)\,dw = 0$, and this fact is the main reason for the bias reduction achieved by Abramson's (1982) 'inverse square root rule'. We sketch an argument for completeness. Note first that, from the expression for $g''$ in (22), integrating by parts,

$$\int_{-B}^{B}w^2\big[r(s')^2K''(s)\big](0)\,dw = \frac{1}{4}f^{1/2}(t)(f')^2(t)\int_{-B}^{B}w^4K''\big(wf^{1/2}(t)\big)\,dw = -(f')^2(t)\int_{-B}^{B}w^3K'\big(wf^{1/2}(t)\big)\,dw.$$

Collecting terms in $K'$, this gives

$$\int_{-B}^{B}w^2g''(0)\,dw = \Big[\frac{3}{4}f^{-1/2}(t)(f')^2(t) + \frac{3}{2}f^{1/2}(t)f''(t)\Big]\int_{-B}^{B}w^2K\big(wf^{1/2}(t)\big)\,dw + \Big[\frac{1}{4}(f')^2(t) + \frac{1}{2}f(t)f''(t)\Big]\int_{-B}^{B}w^3K'\big(wf^{1/2}(t)\big)\,dw,$$

and, integrating by parts the second integral, we get zero. [See Novak (1999) for a proof that, if one replaces $f^{1/2}$ by $f^{\alpha}$ (and $f^{3/2}$ by $f^{\alpha+1}$) in the definition of $f_n(t;h)$, the only $\alpha$ for which $\int_{-B}^{B}w^2g''(0)\,dw = 0$ for all $f$ twice differentiable with $f(t) \ne 0$ is $\alpha = 1/2$.] Thus, we have

$$\int_{-B}^{B}w^ig^{(i)}_{t,w}(0)\,dw = 0 \quad \text{for } i = 1, 2, 3,$$

and we conclude, from this, (20), (25) and (26), that

$$Ef_n(t;h) = \int_{-B}^{B}g_{t,w}(hw)\,dw = f(t) + \frac{h^4}{4!}\int_{-B}^{B}w^4E_\tau g^{(4)}_{t,w}(\tau hw)\,dw. \tag{27}$$
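The vanishing of the second-order term can also be checked numerically. The sketch below is an illustration under assumptions (a standard normal $f$, a biweight kernel with $T = 1$, and hypothetical helper names), not part of the original proof. It uses the observation that, after the substitution $v = wf^{\alpha}(t+u)$, the quantity $\int_{-B}^{B}w^k f^{1+\alpha}(t+u)K(wf^{\alpha}(t+u))\,dw$ equals $f^{1-k\alpha}(t+u)\int v^kK(v)\,dv$ whenever $Bf^{\alpha}(t+u) \ge T$; for $k = 2$ this is constant in $u$ exactly when $\alpha = 1/2$.

```python
import numpy as np

def biweight(v):
    """Biweight kernel on [-1, 1]; its second moment is 1/7."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) <= 1.0, 15.0 / 16.0 * (1.0 - v**2) ** 2, 0.0)

def normal_pdf(x):
    """Standard normal density (the density f assumed for this check)."""
    return np.exp(-0.5 * np.asarray(x, dtype=float) ** 2) / np.sqrt(2.0 * np.pi)

def weighted_moment(k, u, t, a, B=3.0, f=normal_pdf, kernel=biweight, m=40001):
    """Trapezoid approximation of int_{-B}^{B} w^k f^{1+a}(t+u) K(w f^a(t+u)) dw."""
    w = np.linspace(-B, B, m)
    fa = float(f(t + u)) ** a
    y = w**k * float(f(t + u)) * fa * kernel(w * fa)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(w)) / 2.0)

def second_difference(k, t, a, step=1e-2):
    """Central second difference in u at u = 0, approximating int w^k g''(0) dw."""
    m_plus = weighted_moment(k, step, t, a)
    m_zero = weighted_moment(k, 0.0, t, a)
    m_minus = weighted_moment(k, -step, t, a)
    return (m_plus - 2.0 * m_zero + m_minus) / step**2
```

With $\alpha = 1/2$, `second_difference(2, t, 0.5)` should be numerically indistinguishable from zero, while with $\alpha = 1$ it should not be; the defaults $B = 3$ and $t$ near the mode are chosen so that $Bf^{\alpha}(t+u) \ge 1$ holds for both exponents.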



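Using the formula for $g^{(4)}$ in (22), integrating by parts and collecting terms, it is tedious but straightforward to check that

$$\int_{-B}^{B}w^4g^{(4)}_{t,w}(0)\,dw = \Big[\frac{24(f')^4(t)}{f^5(t)} - \frac{36(f')^2(t)f''(t)}{f^4(t)} + \frac{8f'(t)f'''(t) + 6(f'')^2(t)}{f^3(t)} - \frac{f^{(4)}(t)}{f^2(t)}\Big]\int v^4K(v)\,dv, \tag{28}$$

and to note that

$$\sup_{f\in\mathcal{D}_{C,z}}\sup_{t\in D_r}\Big|\int_{-B}^{B}w^4g^{(4)}_{t,w}(0)\,dw\Big| < \infty. \tag{29}$$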



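Now, the boundedness and uniform continuity of $K$ and its four derivatives and the facts that, for $f \in \mathcal{D}_{C,z}$, $f$ and its first three derivatives are Lipschitz with common constant $C$ and the fourth derivatives $f^{(4)}$ have all the same modulus of continuity $z$ at all $t$, and that $f$ is bounded away from zero in a neighborhood of $D_r$, imply that

$$\lim_{n\to\infty}\sup_{f\in\mathcal{D}_{C,z}}\sup_{0\le\tau\le 1}\sup_{w\in[-B,B]}\sup_{t\in D_r}\big|g^{(4)}_{t,w}(\tau h_nw) - g^{(4)}_{t,w}(0)\big| = 0. \tag{30}$$

Therefore,

$$\lim_{n\to\infty}\sup_{f\in\mathcal{D}_{C,z}}\sup_{t\in D_r}\Big|\int_{-B}^{B}w^4E_\tau\big(g^{(4)}_{t,w}(\tau h_nw) - g^{(4)}_{t,w}(0)\big)\,dw\Big| = 0$$

and we have from this and (27) that

$$\sup_{f\in\mathcal{D}_{C,z}}\sup_{t\in D_r}\Big|h_n^{-4}\big(Ef_n(t;h_n) - f(t)\big) - \frac{1}{4!}\int_{-B}^{B}w^4g^{(4)}_{t,w}(0)\,dw\Big| \le \frac{1}{4!}\sup_{f\in\mathcal{D}_{C,z}}\sup_{t\in D_r}\int_{-B}^{B}w^4E_\tau\big|g^{(4)}_{t,w}(\tau h_nw) - g^{(4)}_{t,w}(0)\big|\,dw \to 0$$

as $n \to \infty$. This, together with (28) and (29), proves the proposition. ■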

This proposition is similar to Theorem 3.1 of Hall, Hu and Marron (1995) and to Theorem 1 
of Novak (1999), who do not consider uniformity in t or /, and our proof is somewhat adapted 
from the latter reference (which deals with a slightly different estimator). See also Hall (1990), 
Terrell and Scott (1992) and McKay (1993). 

Combining Propositions 1 and 2 we obtain the following result for the 'ideal' estimator.

Theorem 3 Under the Assumptions 2 and with $h_n = ((\log n)/n)^{1/9}$, we have, for every $0 < C < \infty$ and function $z$ such that $z(h) \searrow 0$ as $h \searrow 0$, for all $r > 0$ and constant $B > T/r^{1/2}$ in the definition of $f_n(t;h)$ in (2),

$$\sup_{t\in D_r}|f(t;h_n) - f(t)| = O_{a.s.}\Big(\Big(\frac{\log n}{n}\Big)^{4/9}\Big) \text{ uniformly in } f \in \mathcal{D}_{C,z}. \tag{31}$$
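As a quick consistency check on the rate in (31), and not as part of the original argument, the following computation sketches why the bandwidth $h_n = ((\log n)/n)^{1/9}$ balances the stochastic term of Proposition 1 against the $h_n^4$ bias of Proposition 2. Constants are ignored throughout.

```latex
\begin{align*}
nh_n &= n^{8/9}(\log n)^{1/9}, \qquad
\log h_n^{-1} = \tfrac{1}{9}\log\frac{n}{\log n} \sim \tfrac{1}{9}\log n,\\
\sqrt{\frac{\log h_n^{-1}}{nh_n}} &\asymp \sqrt{\frac{\log n}{n^{8/9}(\log n)^{1/9}}}
  = \Big(\frac{\log n}{n}\Big)^{4/9},\\
h_n^4 &= \Big(\frac{\log n}{n}\Big)^{4/9}.
\end{align*}
```

Both contributions are therefore of the same order, which is the sup-norm rate stated in the theorem.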



Remark 1 The limit (30) is straightforward, but lengthy to compute. By way of illustration we indicate how to prove a 'small piece' of it. Let us consider, for example, the term in $f^{(4)}$ from the first summand $r^{(4)}K$ in the expression for $g^{(4)}$ in (22). It is $(3/2)f^{1/2}(t+u)f^{(4)}(t+u)K(wf^{1/2}(t+u))$. Then,

$$\begin{aligned}
\big|f^{1/2}(t+u)f^{(4)}(t+u)K(s(t+u)) - f^{1/2}(t)f^{(4)}(t)K(s(t))\big| &\le \|f\|_\infty^{1/2}\|K\|_\infty\big|f^{(4)}(t+u) - f^{(4)}(t)\big|\\
&\quad + \|f^{(4)}\|_\infty\|K\|_\infty\big|f^{1/2}(t+u) - f^{1/2}(t)\big| + \|f^{(4)}\|_\infty\|f\|_\infty^{1/2}\big|K(s(t+u)) - K(s(t))\big|.
\end{aligned}$$

And we have, for the first summand,

$$\big|f^{(4)}(t + \tau h_nw) - f^{(4)}(t)\big| \le z(Bh_n) \to 0$$

uniformly in $t$ and $f$ (recall $0 \le \tau \le 1$, $|w| \le B$). For the second summand, for $n$ large enough,

$$\big|f^{1/2}(t + w\tau h_n) - f^{1/2}(t)\big| \le \frac{|f(t + w\tau h_n) - f(t)|}{r^{1/2}} \le \frac{CBh_n}{r^{1/2}} \to 0,$$

and the limit zero for the third follows directly by uniform continuity of $K$ and the common Lipschitz constant $C$ for all $f \in \mathcal{D}_{C,z}$.

3 Comparison between the ideal and the true estimators 

In this section we make the following assumptions on the kernel $K$, the densities $f$ and the band sequences:

Assumptions 3 We assume that $K$ is supported by $[-T,T]$ for some $T < \infty$ and that it has a uniformly bounded second derivative. We also assume that the densities $f$ are bounded and have at least two bounded derivatives,

$$f \in \mathcal{D}_C := \{f \text{ is a density} : \|f^{(k)}\|_\infty \le C,\ 0 \le k \le 2\} \tag{32}$$

for some $C < \infty$. We set $h_{1,n} = n^{-2/9}$ and $h_{2,n} = ((\log n)/n)^{1/9}$, $n \in \mathbb{N}$.
Let

$$\hat f(t;h_{1,n},h_{2,n}) = \frac{1}{nh_{2,n}}\sum_{i=1}^n K\Big(\frac{t-X_i}{h_{2,n}}\hat f^{1/2}(X_i;h_{1,n})\Big)\hat f^{1/2}(X_i;h_{1,n})\,I(|t-X_i| < h_{2,n}B), \tag{33}$$

where $\hat f(x;h_{1,n})$ is the classical kernel density estimator

$$\hat f(x;h_{1,n}) = \frac{1}{nh_{1,n}}\sum_{i=1}^n K\Big(\frac{x-X_i}{h_{1,n}}\Big). \tag{34}$$

The object of this section consists in proving that

$$\hat f(t;h_{1,n},h_{2,n}) - f(t;h_{2,n}) \tag{35}$$

is asymptotically almost surely of the order of $\sqrt{(\log h_{2,n}^{-1})/(nh_{2,n})}$ uniformly in $t$ on the region $D_r$ defined in (4), for any $r > 0$, if we take $h_{2,n} = ((\log n)/n)^{1/9}$ and $h_{1,n} = n^{-2/9}$. Note that $h_{2,n}$ is the optimal rate 'up to a log' given the order of the bias, whereas the preliminary estimator has a bandwidth sensibly smaller than the optimal $n^{-1/5}$ (it is less smooth than the optimal, 'undersmoothed') and therefore its bias will be negligible with respect to its variance term. The main result of this paper will follow from this analysis and the result from the 'ideal' estimator.
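The exponent bookkeeping behind the word 'undersmoothed' can be made explicit with exact rational arithmetic. The sketch below is illustrative only: the helper name `pilot_exponents` is hypothetical, logarithmic factors are ignored, and it assumes the classical orders $h^2$ for the bias and $(nh)^{-1/2}$ for the stochastic term of a kernel estimator with bandwidth $h = n^{-a}$.

```python
from fractions import Fraction

def pilot_exponents(a):
    """For a pilot bandwidth h = n^(-a), return (bias, stochastic) exponents.

    The bias is of order h^2 = n^(-2a) and the stochastic term is of order
    (n h)^(-1/2) = n^(-(1 - a)/2); a larger exponent means a smaller term.
    """
    a = Fraction(a)
    return 2 * a, (1 - a) / 2

undersmoothed = pilot_exponents(Fraction(2, 9))   # the choice h_{1,n} = n^(-2/9)
balanced = pilot_exponents(Fraction(1, 5))        # the classical n^(-1/5) choice
print(undersmoothed, balanced)
```

With $a = 2/9$ the bias exponent exceeds the stochastic exponent, so the pilot's bias is of smaller order than its stochastic term, whereas $a = 1/5$ makes the two exponents coincide.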

We follow the pattern in Hall and Marron (1988) and Hall, Hu and Marron (1995) for the linearization of (35), with significant differences in order to account for the uniformity in $t$. For
instance, they do not necessarily undersmooth the preliminary estimator (whereas we believe 
one should) and, moreover, we are required to use empirical and U-process theory. We adhere 
to their notation as much as possible. 

The first step is to notice that, if we define $\delta_n(t)$ by the equation

$$\delta_n(t) = \frac{\hat f^{1/2}(t;h_{1,n}) - f^{1/2}(t)}{f^{1/2}(t)} = \frac{\hat f(t;h_{1,n}) - f(t)}{\big(\hat f^{1/2}(t;h_{1,n}) + f^{1/2}(t)\big)f^{1/2}(t)}, \tag{36}$$

so that $\hat f^{1/2}(t;h_{1,n}) = f^{1/2}(t)(1 + \delta_n(t))$, then we have

$$\sup_{t\in D_r^\varepsilon}|\delta_n(t)| = o_{a.s.}(1) \text{ uniformly in } f \text{ such that } \|f\|_\infty \le C, \tag{37}$$

where $D_r^\varepsilon$ denotes the $\varepsilon$-neighborhood of $D_r$ for $\varepsilon$ such that $f(t) > r/2$ in $D_r^\varepsilon$ ($f$ is uniformly continuous). We drop the subindex $n$ from $\delta$ from now on. Set

$$D(t;h_{1,n}) = \hat f(t;h_{1,n}) - E\hat f(t;h_{1,n}) \quad \text{and} \quad b(t;h_{1,n}) = E\hat f(t;h_{1,n}) - f(t)$$

and note that

$$\|D(\cdot;h_{1,n})\|_\infty = O_{a.s.}\Big(\sqrt{\frac{\log h_{1,n}^{-1}}{nh_{1,n}}}\Big) \text{ uniformly in } f \text{ such that } \|f\|_\infty \le C \tag{38}$$

for all $0 < C < \infty$ by a result in Deheuvels (2000) and in Giné and Guillou (2002), and that

$$\|b(\cdot;h_{1,n})\|_\infty \le \Big(\int K(u)u^2\,du\Big)\|f''\|_\infty h_{1,n}^2 \tag{39}$$

by the classical bias computation for symmetric kernels. Since the numerator in the expression at the right hand side of (36) is just $D(t) + b(t)$ and the denominator is not smaller than $f(t)$, which is in turn larger than $r/2$, (37) follows from (38) and (39). Define

$$L_1(z) = zK'(z) \quad \text{and} \quad L(z) = K(z) + zK'(z), \quad z \in \mathbb{R}.$$

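We then have

$$\begin{aligned}
K\Big(\frac{t-X_i}{h_{2,n}}\hat f^{1/2}(X_i;h_{1,n})\Big) &= K\Big(\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i) + \frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\delta(X_i)\Big)\\
&= K\Big(\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\Big) + K'\Big(\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\Big)\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\delta(X_i) + \delta_2(t,X_i)\\
&= K\Big(\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\Big) + L_1\Big(\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\Big)\delta(X_i) + \delta_2(t,X_i),
\end{aligned}$$

where

$$\delta_2(t,X_i) = \frac{K''(\xi)}{2}\,\frac{(t-X_i)^2}{h_{2,n}^2}\,f(X_i)\delta^2(X_i), \tag{40}$$

$\xi$ being a (random) number between $\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)$ and $\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i) + \frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\delta(X_i)$. Then, plugging this development and that of $\hat f^{1/2}$ in the definition (33) of $\hat f$, we obtain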

$$\begin{aligned}
\hat f(t;h_{1,n},h_{2,n}) &= f(t;h_{2,n})\\
&\quad + \frac{1}{nh_{2,n}}\sum_{i=1}^n L_1\Big(\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\Big)f^{1/2}(X_i)\delta(X_i)\,I(|t-X_i| < h_{2,n}B)\\
&\quad + \frac{1}{nh_{2,n}}\sum_{i=1}^n K\Big(\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\Big)f^{1/2}(X_i)\delta(X_i)\,I(|t-X_i| < h_{2,n}B)\\
&\quad + \frac{1}{nh_{2,n}}\sum_{i=1}^n f^{1/2}(X_i)\delta_2(t,X_i)\,I(|t-X_i| < h_{2,n}B) &(41)\\
&\quad + \frac{1}{nh_{2,n}}\sum_{i=1}^n L_1\Big(\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\Big)f^{1/2}(X_i)\delta^2(X_i)\,I(|t-X_i| < h_{2,n}B) &(42)\\
&\quad + \frac{1}{nh_{2,n}}\sum_{i=1}^n f^{1/2}(X_i)\delta(X_i)\delta_2(t,X_i)\,I(|t-X_i| < h_{2,n}B) &(43)\\
&= f(t;h_{2,n}) + \delta_3(t) + \frac{1}{nh_{2,n}}\sum_{i=1}^n L\Big(\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\Big)f^{1/2}(X_i)\delta(X_i)\,I(|t-X_i| < h_{2,n}B), &(44)
\end{aligned}$$

where $\delta_3(t)$ is the sum of the terms (41), (42) and (43), which are of a smaller order than the term (44) by (37) and (40) for $t \in D_r$ (as we will readily check). Since by (38) and (39), $D(y;h_{1,n})$ dominates $b(y;h_{1,n})$ uniformly in $\mathbb{R}$, we should further decompose (44) to display its $D$-part and its $b$-part. By the definitions of $\delta$, $D$ and $b$, we have

$$\begin{aligned}
\delta(t) &= \frac{D(t;h_{1,n})}{2f(t)} + \frac{b(t;h_{1,n})}{2f(t)} + \frac{D(t;h_{1,n}) + b(t;h_{1,n})}{2f(t)}\cdot\frac{f^{1/2}(t) - \hat f^{1/2}(t;h_{1,n})}{\hat f^{1/2}(t;h_{1,n}) + f^{1/2}(t)}\\
&=: \frac{D(t;h_{1,n})}{2f(t)} + \frac{b(t;h_{1,n})}{2f(t)} + \delta_4(t),
\end{aligned}$$

(where $\delta_4$ depends on $n$, but we do not display this dependence) and note that (again using $a^{1/2} - b^{1/2} = (a-b)/(a^{1/2} + b^{1/2})$),

$$\sup_{t\in D_r^\varepsilon}|\delta_4(t)| \le \frac{2}{r^2}\sup_{t\in D_r^\varepsilon}\big[D(t;h_{1,n}) + b(t;h_{1,n})\big]^2 \tag{45}$$

which is small by (37) (note that $\delta_n$ is $D + b$ divided by a quantity which is bounded away from zero on $D_r$). Setting

$$\varepsilon_1(t,h_{1,n},h_{2,n}) := \frac{1}{nh_{2,n}}\sum_{i=1}^n L\Big(\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\Big)f^{-1/2}(X_i)D(X_i;h_{1,n})\,I(|t-X_i| < h_{2,n}B), \tag{46}$$

$$\varepsilon_2(t,h_{1,n},h_{2,n}) := \frac{1}{nh_{2,n}}\sum_{i=1}^n L\Big(\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\Big)f^{-1/2}(X_i)b(X_i;h_{1,n})\,I(|t-X_i| < h_{2,n}B), \tag{47}$$

and

$$\varepsilon_3(t,h_{1,n},h_{2,n}) := \delta_3(t) + \frac{1}{nh_{2,n}}\sum_{i=1}^n L\Big(\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\Big)f^{1/2}(X_i)\delta_4(X_i)\,I(|t-X_i| < h_{2,n}B), \tag{48}$$

we obtain (from (41)-(45))

$$\hat f(t;h_{1,n},h_{2,n}) = f(t;h_{2,n}) + \frac{1}{2}\varepsilon_1(t) + \frac{1}{2}\varepsilon_2(t) + \varepsilon_3(t). \tag{49}$$

By the comments above, the $\varepsilon_2$ and $\varepsilon_3$ terms will be of smaller order than $\varepsilon_1$. $\varepsilon_1$ itself has a U-process structure, and the linear term in its Hoeffding decomposition will be the dominant term. This is the content of the lemmas that follow.

Lemma 1 For $i = 2, 3$,

$$\sup_{t\in D_r}|\varepsilon_i(t,h_{1,n},h_{2,n})| = O_{a.s.}(n^{-4/9}) \text{ uniformly in } f \in \mathcal{D}_C$$

for all $C < \infty$.

Proof. We begin with $i = 2$. Because the function $L$ is of bounded variation and $b(t;h_{1,n})$ satisfies inequality (39), it follows (see the Appendix) that the classes of functions

$$\mathcal{Q}_n := \Big\{Q(x) = L\Big(\frac{t-x}{h_{2,n}}f^{1/2}(x)\Big)f^{-1/2}(x)b(x;h_{1,n})\,I(|t-x| < h_{2,n}B) : t \in D_r\Big\} \tag{50}$$

are of VC type with the same characteristics $A$ and $v$, for envelopes of the order of $M(K,r)\|f''\|_\infty h_{1,n}^2$, where $M$ depends on $r$ and $K$ only (in particular, through $L$). If we set

$$Q_i(t) = L\Big(\frac{t-X_i}{h_{2,n}}f^{1/2}(X_i)\Big)f^{-1/2}(X_i)b(X_i;h_{1,n})\,I(|t-X_i| < h_{2,n}B)$$

it then follows (by the bound (39) on $b$, boundedness of $L$ and boundedness away from zero of $f$ on $D_r$), that

$$\sup_{t\in D_r}E|Q_i(t)| \le \|f''\|_\infty h_{1,n}^2h_{2,n} = \|f''\|_\infty n^{-5/9}(\log n)^{1/9},$$

$$\sup_{t\in D_r}EQ_i^2(t) \le \|f''\|_\infty^2h_{1,n}^4h_{2,n} \le \|f''\|_\infty^2n^{-1}(\log n)^{1/9}, \qquad \sup_{t\in D_r}|Q_i(t)| \le \|f''\|_\infty h_{1,n}^2 = \|f''\|_\infty n^{-4/9},$$

where in these bounds we ignore multiplicative constants that do not depend on $f$. We have

$$\begin{aligned}
\sup_{t\in D_r}|\varepsilon_2(t;h_{1,n},h_{2,n})| &\le \sup_{t\in D_r}\Big|\frac{1}{nh_{2,n}}\sum_{i=1}^n[Q_i(t) - EQ_i(t)]\Big| + \sup_{t\in D_r}\frac{1}{h_{2,n}}|EQ_i(t)|\\
&\le \sup_{t\in D_r}\Big|\frac{1}{nh_{2,n}}\sum_{i=1}^n[Q_i(t) - EQ_i(t)]\Big| + \|f''\|_\infty n^{-4/9},
\end{aligned}$$

and Talagrand's inequality (60) gives that for $0 < \delta < 4/9$,

$$\sum_n\sup_{f\in\mathcal{D}_C}\Pr\nolimits_f\Big\{\sup_{t\in D_r}\Big|\sum_{i=1}^n[Q_i(t) - EQ_i(t)]\Big| > n^\delta\Big\} \le C_2\sum_n\exp\Big(-\frac{C_3n^{2\delta}}{C^2(\log n)^{1/9}}\Big) < \infty.$$

Since $n^\delta < nh_{2,n}n^{-4/9}$, we conclude

$$\sup_{t\in D_r}|\varepsilon_2(t;h_{1,n},h_{2,n})| = O_{a.s.}(n^{-4/9}) \text{ uniformly in } f \in \mathcal{D}_C$$

proving the lemma for $\varepsilon_2$. Note that $h_{1,n}^2 \simeq n^{-4/9}$ plays a critical role in this estimation.

Next, from (48) we see that $\varepsilon_3$ consists of four sums, the three that define $\delta_3$ and one involving $\delta_4$ (multiplied by bounded terms and by the indicator of $|X_i - t| < h_{2,n}B$). The three terms from $\delta_3$ involve, instead of $\delta_4$, respectively $\delta_2$, $\delta^2$ and $\delta_2\delta$ (see (41)-(43)). We have from (36), (38) and (39) that

$$\sup_{t\in D_r^\varepsilon}\delta^2(t) = O_{a.s.}\big(n^{-7/9}\log n\big) \text{ uniformly in } f \in \mathcal{D}_C,$$

that the same is true for $\delta_4$ by (45), and, moreover, by (40),

$$\sup_{t\in D_r}|\delta_2(t,X_i)|\,I(|t-X_i| < h_{2,n}B) \le B^2\|f\|_\infty\|K''\|_\infty\sup_{t\in D_r^\varepsilon}\delta^2(t) = O_{a.s.}\big(n^{-7/9}\log n\big) \text{ uniformly in } f \in \mathcal{D}_C.$$

Then, if we define $Q_i(t)$ by

$$\varepsilon_3(t,h_{1,n},h_{2,n}) = \frac{1}{nh_{2,n}}\sum_{i=1}^nQ_i(t),$$

we have

$$\sup_{t\in D_r}|Q_i(t)| = O_{a.s.}\big(n^{-7/9}\log n\big) \text{ uniformly in } f \in \mathcal{D}_C,$$

and therefore,

$$\sup_{t\in D_r}|\varepsilon_3(t;h_{1,n},h_{2,n})| = O_{a.s.}\big(h_{2,n}^{-1}n^{-7/9}\log n\big) \text{ uniformly in } f \in \mathcal{D}_C,$$

proving the lemma for $\varepsilon_3$ as $h_{2,n}^{-1}n^{-7/9}\log n \ll n^{-4/9}$.
Lemma 2 Let

$$T(t;h_{1,n},h_{2,n}) = \frac{1}{nh_{1,n}h_{2,n}}\sum_{i=1}^nE_X\Big[f^{-1/2}(X)\Big(K\Big(\frac{X-X_i}{h_{1,n}}\Big)-E_YK\Big(\frac{X-Y}{h_{1,n}}\Big)\Big)L\Big(\frac{t-X}{h_{2,n}}f^{1/2}(X)\Big)I(|t-X|<h_{2,n}B)\Big],\qquad(51)$$

where $L(z)=K(z)+zK'(z)$. Then,

$$\sup_{t\in D_r}|\varepsilon_1(t;h_{1,n},h_{2,n})-T(t;h_{1,n},h_{2,n})| = o_{\rm a.s.}(n^{-4/9})\quad\text{uniformly in } f\in\mathcal P_C.$$

Proof. Given a function $H$ of two variables, and two i.i.d. random variables $X$ and $Y$ such that $H(X,Y)$ is integrable, we recall that the second order Hoeffding projection of $H(X,Y)$ is

$$\pi_2(H)(X,Y) = H(X,Y)-E_XH(X,Y)-E_YH(X,Y)+EH.$$

We also recall the $U$-statistic notation

$$U_n(H) = \frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}H(X_i,X_j),$$

where the variables $X_i$ are i.i.d. Set

$$H_t(X,Y) := L\Big(\frac{t-X}{h_{2,n}}f^{1/2}(X)\Big)f^{-1/2}(X)K\Big(\frac{X-Y}{h_{1,n}}\Big)I(|t-X|<h_{2,n}B).$$



Then,

$$\frac{n^2h_{1,n}h_{2,n}}{n(n-1)}\,\varepsilon_1(t;h_{1,n},h_{2,n}) = \frac{1}{n(n-1)}\sum_{i=1}^n\big(H_t(X_i,X_i)-E_YH_t(X_i,Y)\big)+U_n\big(H_t-E_YH_t(\cdot,Y)\big)\qquad(52)$$

(decomposition of a $V$-statistic into the diagonal term and a $U$-statistic). Now, notice that

$$U_n\big(H_t-E_YH_t(\cdot,Y)\big) = U_n\big(\pi_2(H_t(\cdot,\cdot))+(E_XH_t(X,\cdot)-EH_t)\big) = U_n\big(\pi_2(H_t(\cdot,\cdot))\big)+h_{1,n}h_{2,n}T(t;h_{1,n},h_{2,n}).\qquad(53)$$

So, we now must handle the diagonal term, a completely centered or canonical $U$-process and (in the next lemma) the empirical process $T$.
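The two algebraic facts used here (the split of a $V$-statistic into a diagonal term plus a $U$-statistic, and the split of the partially centered kernel into its canonical part plus an empirical-process part) can be verified exactly on a toy example. The sketch below is an illustration under stated assumptions only: the kernel `H`, the finite support and the sample are hypothetical stand-ins for $H_t$ and the data, chosen so that all expectations are finite sums computed with exact fractions.

```python
from fractions import Fraction as Fr
from itertools import product

# Toy setting (an assumption): X, Y i.i.d. uniform on a finite support.
support = [Fr(0), Fr(1), Fr(3)]
p = Fr(1, len(support))


def H(x, y):
    """A non-symmetric toy kernel standing in for H_t."""
    return x * x * y + x - 2 * y


def E_first(y):
    """E_X H(X, y)."""
    return sum(p * H(x, y) for x in support)


def E_second(x):
    """E_Y H(x, Y)."""
    return sum(p * H(x, y) for y in support)


EH = sum(p * p * H(x, y) for x, y in product(support, repeat=2))


def pi2(x, y):
    """Second order Hoeffding projection of H."""
    return H(x, y) - E_first(y) - E_second(x) + EH


def U(kernel, sample):
    """U-statistic: average of kernel over ordered pairs i != j."""
    m = len(sample)
    total = sum(kernel(sample[i], sample[j])
                for i in range(m) for j in range(m) if i != j)
    return total / (m * (m - 1))


sample = [Fr(1), Fr(3), Fr(0), Fr(3), Fr(1)]
n = len(sample)

# V-statistic = diagonal term + (n-1)/n times the U-statistic.
v_stat = sum(H(x, y) for x in sample for y in sample) / n**2
diag = sum(H(x, x) for x in sample) / n**2
v_split_ok = v_stat == diag + Fr(n - 1, n) * U(H, sample)

# pi2(H) is canonical: both conditional expectations vanish identically.
canonical_ok = (all(sum(p * pi2(x, y) for x in support) == 0 for y in support)
                and all(sum(p * pi2(x, y) for y in support) == 0 for x in support))

# H - E_Y H(., Y) = pi2(H) + (E_X H(X, .) - EH), averaged as a U-statistic.
lhs = U(lambda x, y: H(x, y) - E_second(x), sample)
rhs = U(pi2, sample) + sum(E_first(y) - EH for y in sample) / n
split_ok = lhs == rhs

print(v_split_ok, canonical_ok, split_ok)
```

The last identity is the toy analogue of (53): the $U$-statistic of $E_XH(X,\cdot)-EH$ collapses to an ordinary sample mean, which in the text is the term $h_{1,n}h_{2,n}T$.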
Diagonal term. Note that if we define $Q_i$ such that

$$\frac{1}{n^2h_{1,n}h_{2,n}}\sum_{i=1}^n\big(H_t(X_i,X_i)-E_YH_t(X_i,Y)\big) := \frac{1}{n^2h_{1,n}h_{2,n}}\sum_{i=1}^nQ_i(t),$$

then we have

$$\sup_{t\in D_r}|EQ_i(t)|\le h_{2,n},\qquad \sup_{t\in D_r}EQ_1^2(t)\le h_{2,n},\qquad \sup_{t\in D_r}|Q_i(t)|\le 1,$$

where as usual we overlook multiplicative constants that do not depend on $f$, and the last bound does not depend on $n$. So,

$$\sup_{t\in D_r}\frac{1}{n^2h_{1,n}h_{2,n}}\Big|\sum_{i=1}^nQ_i(t)\Big| \le \frac{1}{n^2h_{1,n}h_{2,n}}\sup_{t\in D_r}\Big|\sum_{i=1}^n\big(Q_i(t)-EQ_1(t)\big)\Big|+\frac{1}{nh_{1,n}}.$$
The supremum part corresponds to the empirical process over the class of functions of $x$

$$\mathcal Q_n = \Big\{L\Big(\frac{t-x}{h_{2,n}}f^{1/2}(x)\Big)f^{-1/2}(x)\Big(K(0)-EK\Big(\frac{x-X}{h_{1,n}}\Big)\Big)I(|t-x|<h_{2,n}B): t\in D_r\Big\}.\qquad(54)$$

These classes are VC type with the same characteristics $A$ and $v$ that do not depend on $f$, and with the same envelope, that depends only on $K$ and $r$ (see the Appendix). Then, Talagrand's inequality gives, as in previous instances, that, for some $\delta>0$,

$$\sum_n\sup_{f\in\mathcal P_C}\Pr_f\Big\{\sup_{t\in D_r}\Big|\sum_{i=1}^n\big(Q_i(t)-EQ_1(t)\big)\Big|>n^{(4+\delta)/9}\Big\} \le C_2\sum_n\exp\Big(-C_3\frac{n^{(8+2\delta)/9}}{nh_{2,n}}\Big)<\infty,$$

which, since $n^2h_{1,n}h_{2,n}n^{-4/9}>n^{(4+\delta)/9}$ and since $nh_{1,n}\gg n^{4/9}$, yields

$$\sup_{t\in D_r}\frac{1}{n^2h_{1,n}h_{2,n}}\Big|\sum_{i=1}^n\big(H_t(X_i,X_i)-E_YH_t(X_i,Y)\big)\Big| = o_{\rm a.s.}(n^{-4/9})\quad\text{uniformly in } f\in\mathcal P_C.\qquad(55)$$



The canonical $U$-statistic term. We will use Major's exponential bound (61) for canonical $U$-processes over VC type classes of functions. In our case, since the class of functions $\{H_t: t\in D_r\}$ is uniformly bounded and of VC type (see the Appendix) we can apply Major's exponential bound to $\sup_{t\in D_r}|U_n(\pi_2(H_t))|$. Since, as is easy to check,

$$EH_t^2(X,Y)\le 2B\|L\|_\infty^2\|f\|_\infty\|K\|_2^2h_{1,n}h_{2,n},$$

we can take, for $C$ such that $\|f\|_\infty\le C$, $\sigma^2=Ch_{1,n}h_{2,n}$ and $t=C^{1/2}n^{1+\delta}\sqrt{h_{1,n}h_{2,n}}$ for a small $\delta>0$, to have, from (61),

$$\sum_n\sup_{f:\|f\|_\infty\le C}\Pr_f\Big\{\sup_{t\in D_r}|U_n(\pi_2(H_t))|>Cn^{\delta-1}\sqrt{h_{1,n}h_{2,n}}\Big\} \le C_2\sum_n\exp(-C_3n^\delta)<\infty.$$

Since

$$\frac{n^{\delta-1}}{\sqrt{h_{1,n}h_{2,n}}}\ll n^{-4/9}$$

(we can take $\delta$ so that this is true), we obtain

$$\sup_{t\in D_r}\frac{1}{h_{1,n}h_{2,n}}|U_n(\pi_2(H_t))| = o_{\rm a.s.}(n^{-4/9})\quad\text{uniformly in } f\in\mathcal P_C.\qquad(56)$$



The following lemma will conclude the analysis of (35).

Lemma 3 With $T$ as defined in Lemma 2, we have

$$\sup_{t\in D_r}|T(t;h_{1,n},h_{2,n})| = O_{\rm a.s.}\Big(\Big(\frac{\log n}{n}\Big)^{4/9}\Big)\quad\text{uniformly in } f\in\mathcal P_C.$$

Proof. Note that

$$T(t;h_{1,n},h_{2,n}) = \frac{1}{nh_{1,n}h_{2,n}}\sum_{i=1}^n\big(g(t,X_i)-Eg(t,X)\big),$$

where

$$g(t,x) = E_X\Big[f^{-1/2}(X)K\Big(\frac{X-x}{h_{1,n}}\Big)L\Big(\frac{t-X}{h_{2,n}}f^{1/2}(X)\Big)I(|t-X|<h_{2,n}B)\Big].\qquad(57)$$
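By (66) in the Appendix, the class of functions $\{g(t,\cdot): t\in D_r\}$ is of VC type for the envelope $\|L\|_V\|K\|_2h_{1,n}^{1/2}$ and the characteristics $A=R$ and $v=22$, and the lemma will follow by application of Talagrand's inequality. We just need to estimate $Eg^2(t,X)$. We have, making several natural changes of variables,

$$
\begin{aligned}
Eg^2(t,X_1) &= E\Big[\Big(\int f^{1/2}(x)K\Big(\frac{x-X_1}{h_{1,n}}\Big)L\Big(\frac{t-x}{h_{2,n}}f^{1/2}(x)\Big)I(|t-x|<h_{2,n}B)\,dx\Big)\\
&\qquad\times\Big(\int f^{1/2}(y)K\Big(\frac{y-X_1}{h_{1,n}}\Big)L\Big(\frac{t-y}{h_{2,n}}f^{1/2}(y)\Big)I(|t-y|<h_{2,n}B)\,dy\Big)\Big]\\
&= h_{1,n}^2\iiint f^{1/2}(t-h_{1,n}v_1)L\Big(\frac{h_{1,n}}{h_{2,n}}v_1f^{1/2}(t-h_{1,n}v_1)\Big)f^{1/2}(t-h_{1,n}v_2)L\Big(\frac{h_{1,n}}{h_{2,n}}v_2f^{1/2}(t-h_{1,n}v_2)\Big)\\
&\qquad\times K\Big(\frac{t-u}{h_{1,n}}-v_1\Big)K\Big(\frac{t-u}{h_{1,n}}-v_2\Big)I\Big(\frac{h_{1,n}}{h_{2,n}}|v_1|<B\Big)I\Big(\frac{h_{1,n}}{h_{2,n}}|v_2|<B\Big)f(u)\,du\,dv_1\,dv_2\\
&= h_{1,n}^3\iiint f^{1/2}(t-h_{1,n}v_1)L\Big(\frac{h_{1,n}}{h_{2,n}}v_1f^{1/2}(t-h_{1,n}v_1)\Big)f^{1/2}(t-h_{1,n}v_2)L\Big(\frac{h_{1,n}}{h_{2,n}}v_2f^{1/2}(t-h_{1,n}v_2)\Big)\\
&\qquad\times K(v)K(v+v_1-v_2)I\Big(\frac{h_{1,n}}{h_{2,n}}|v_1|<B\Big)I\Big(\frac{h_{1,n}}{h_{2,n}}|v_2|<B\Big)f(t-h_{1,n}v_1-h_{1,n}v)\,dv\,dv_1\,dv_2\\
&\le h_{1,n}^3\|f\|_\infty^2\iiint\Big|L\Big(\frac{h_{1,n}}{h_{2,n}}v_1f^{1/2}(t-h_{1,n}v_1)\Big)\Big|\,\Big|L\Big(\frac{h_{1,n}}{h_{2,n}}v_2f^{1/2}(t-h_{1,n}v_2)\Big)\Big|\\
&\qquad\times K(v)K(v+v_1-v_2)I\Big(\frac{h_{1,n}}{h_{2,n}}|v_1|<B\Big)I\Big(\frac{h_{1,n}}{h_{2,n}}|v_2|<B\Big)\,dv\,dv_1\,dv_2\\
&= h_{1,n}^3\|f\|_\infty^2\iiint\Big|L\Big(\frac{h_{1,n}}{h_{2,n}}(w+v_2)f^{1/2}(t-h_{1,n}w-h_{1,n}v_2)\Big)\Big|\,\Big|L\Big(\frac{h_{1,n}}{h_{2,n}}v_2f^{1/2}(t-h_{1,n}v_2)\Big)\Big|\\
&\qquad\times K(v)K(v+w)I\Big(\frac{h_{1,n}}{h_{2,n}}|w+v_2|<B\Big)I\Big(\frac{h_{1,n}}{h_{2,n}}|v_2|<B\Big)\,dv\,dw\,dv_2\\
&= h_{1,n}^2h_{2,n}\|f\|_\infty^2\iiint\Big|L\Big(\Big(\frac{h_{1,n}}{h_{2,n}}w+z\Big)f^{1/2}(t-h_{1,n}w-h_{2,n}z)\Big)\Big|\,\big|L\big(zf^{1/2}(t-h_{2,n}z)\big)\big|\\
&\qquad\times K(v)K(v+w)I\Big(\Big|\frac{h_{1,n}}{h_{2,n}}w+z\Big|<B\Big)I(|z|<B)\,dv\,dw\,dz\\
&\le 2h_{1,n}^2h_{2,n}B\|f\|_\infty^2\big(\|K\|_\infty+B\|f\|_\infty^{1/2}\|K'\|_\infty\big)^2.
\end{aligned}
$$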

So we can take $\sigma^2=c^2(1\vee\|f\|_\infty^3)h_{1,n}^2h_{2,n}$, where $c^2$ depends only on $K$. Since, as indicated above, the collection of functions $g(t,\cdot)$, $t\in D_r$, is VC for an envelope of the order $h_{1,n}^{1/2}$, Talagrand's inequality (60) implies that there exist finite positive constants $c_0,c_1$ such that, with $C_i$ as in (60), if

$$C_1c\,(1\vee\|f\|_\infty^{3/2})\sqrt n\,h_{1,n}h_{2,n}^{1/2}\Big(\log\frac{c_0h_{1,n}^{1/2}}{c\,h_{1,n}h_{2,n}^{1/2}}\Big)^{1/2}\le u\le c_1(1\vee\|f\|_\infty^3)\,nh_{1,n}^{3/2}h_{2,n},\qquad(58)$$
then

$$\Pr_f\Big\{\sup_{t\in D_r}\Big|\sum_{i=1}^n\big(g(t,X_i)-Eg(t,X)\big)\Big|>u\Big\} \le C_2\exp\Big(-\frac{C_3u^2}{c^2(1\vee\|f\|_\infty^3)nh_{1,n}^2h_{2,n}}\Big).$$

The condition on $u$ can be written as

$$C_1'(1\vee\|f\|_\infty^{3/2})n^{2/9}(\log n)^{5/9}\le u\le C_2'(1\vee\|f\|_\infty^3)n^{5/9}(\log n)^{1/9},$$

and if we take $u=M(1\vee\|f\|_\infty^{3/2})n^{2/9}(\log n)^{5/9}$ for some large enough $M$, then

$$\sum_n\exp\Big(-\frac{C_3u^2}{c^2(1\vee\|f\|_\infty^3)nh_{1,n}^2h_{2,n}}\Big) = \sum_n\exp\big(-M^2C_3(\log n)/c^2\big)<\infty$$

uniformly in $f$. Hence,

$$\sum_n\sup_{f\in\mathcal P_C}\Pr_f\Big\{\sup_{t\in D_r}\Big|\sum_{i=1}^n\big(g(t,X_i)-Eg(t,X)\big)\Big|>M(1\vee C^{3/2})n^{2/9}(\log n)^{5/9}\Big\} \le C_2\sum_n\exp\big(-M^2C_3(\log n)/c^2\big)<\infty.$$

This shows that $T(t;h_{1,n},h_{2,n})$ is asymptotically a.s. of the order of $n^{2/9}(\log n)^{5/9}/(nh_{1,n}h_{2,n}) = [(\log n)/n]^{4/9}$ uniformly in $f\in\mathcal P_C$. ■

From PT?]) and Lemmas [TJ [5] and [3J we obtain: 
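Proposition 3 Under Assumptions 1, for any $C<\infty$ we have:

$$\sup_{t\in D_r}\big|\hat f(t;h_{1,n},h_{2,n})-\bar f(t;h_{2,n})-T(t;h_{1,n},h_{2,n})\big| = o_{\rm a.s.}\Big(\Big(\frac{\log n}{n}\Big)^{4/9}\Big)\quad\text{uniformly in } f\in\mathcal P_C,$$

and in particular,

$$\sup_{t\in D_r}\big|\hat f(t;h_{1,n},h_{2,n})-\bar f(t;h_{2,n})\big| = O_{\rm a.s.}\Big(\Big(\frac{\log n}{n}\Big)^{4/9}\Big)\quad\text{uniformly in } f\in\mathcal P_C.$$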



Remark 2 a) We should remark that if we undersmooth the preliminary estimator a little more, by taking $h_{1,n}=n^{-(2+\eta)/9}$ with $0<\eta<2$, then the three lemmas above are true and moreover we have $\sup_{t\in D_r}|\varepsilon_i(t;h_{1,n},h_{2,n})| = o_{\rm a.s.}(n^{-4/9})$ in Lemma 1. So, for such $h_{1,n}$ the order of the first term in Proposition 3 is actually $o_{\rm a.s.}(n^{-4/9})$. This is at odds with condition (9) in Hall, Hu and Marron (1995), as their condition does not necessarily imply undersmoothing of the preliminary estimator. b) It is worth mentioning that Proposition 3 does require that the indicators $I(|t-X_i|<h_{2,n}B)$ be part of the definition of (2) and (3): in fact none of the three lemmas in its proof seems to go through without them. This condition is required as well for the bias of the ideal estimator, but it is not necessary for its variance part.
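To make the role of the indicator concrete, here is a minimal Python sketch of a square-root-law estimator with a preliminary kernel estimate and the truncation $|t-X_i|<h_{2,n}B$. It is a sketch under assumptions, not the paper's definition: the functional form is read off from the classes of functions used in the proofs, the Gaussian kernel is chosen only for convenience (it need not satisfy the paper's assumptions on $K$), and the function names, toy sample and value of $B$ are hypothetical.

```python
import math


def gaussian_kernel(z):
    """Standard normal density, used here only as an example kernel K."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def preliminary_estimate(x, data, h1, kernel=gaussian_kernel):
    """Classical kernel estimate at x with fixed bandwidth h1."""
    return sum(kernel((x - xj) / h1) for xj in data) / (len(data) * h1)


def variable_bandwidth_estimate(t, data, h1, h2, B, kernel=gaussian_kernel):
    """Square-root-law estimate at t, truncated to |t - X_i| < h2 * B."""
    total = 0.0
    for xi in data:
        if abs(t - xi) < h2 * B:  # the indicator discussed in Remark 2 b)
            root = math.sqrt(preliminary_estimate(xi, data, h1, kernel))
            total += kernel((t - xi) / h2 * root) * root
    return total / (len(data) * h2)


# Hypothetical toy sample; bandwidths of the orders used in the text.
data = [-1.3, -0.7, -0.2, 0.0, 0.1, 0.4, 0.9, 1.6]
n = len(data)
h1 = n ** (-2.0 / 9.0)
h2 = (math.log(n) / n) ** (1.0 / 9.0)

inside = variable_bandwidth_estimate(0.0, data, h1, h2, B=2.0)
outside = variable_bandwidth_estimate(50.0, data, h1, h2, B=2.0)
print(inside, outside)
```

With the truncation, observations farther than $h_{2,n}B$ from $t$ contribute nothing, so the estimate is exactly zero far from the sample; each retained term is non-negative whenever the kernel is.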

Now we can complete the proof of the main Theorems 1 and 2. Only the stronger Theorem 2 requires proof:






Proof of Theorem [2j Proposition [3] and Theorem [3] together give (fTTj) . The limit (|T2"j) can be 
easily derived from (fTTTl . as follows. By (|38p and (|59")l . the preliminary estimator satisfies 



sup \f(fr, hi, n ) - f{t)\ = O a .s. I 7/1s ) uniformity in 2? c . 2 (59) 

for all C < oo, z and r. Now, for all n large enough, on the event 

n 7/18 - 
sup -==\\f{p, fti, n ) - /(*)||oo < A] 

n>fc Viog^ 

we have D™ C -D r f° r all n > fc, and therefore, 

Pr/ ( sup ( » ) 4/9 ||/( i; ft^, ka, n ,u) - f(t)\\nn >^\ 

< Pry (sup (-^) 4/9 ||/(i; fo ft 2 w ) - }(t)\\ Dr > A 2 

Ln>fc logn 

f n 7 / 18 - 

+ Pry J Sup — ^||/(i; fo ) - /(i)^ > A] 

Now, there exist Ai and A2 such that the limit of the sup over T>c, z of the first probabilities is 
zero by (TTTI) , and the limit of the sup of the second ones over the same set is also zero by (151)1) , 
proving (fT2j) . ■ 



4 Appendix: Some Vapnik-Červonenkis classes of functions and their exponential bounds

Let $\mathcal F$ be a collection of uniformly bounded measurable functions on $(S,\mathcal S)$. We say that $\mathcal F$ is of VC type with respect to an envelope $F$ if there exist constants $A$, $v$ positive such that for all probability measures $Q$ on $S$,

$$N(\mathcal F,L_2(Q),\varepsilon)\le\Big(\frac{A\|F\|_{L_2(Q)}}{\varepsilon}\Big)^v,\qquad 0<\varepsilon<1,$$

where $F\ge|f|$ for all $f\in\mathcal F$ and $N(\mathcal F,L_2(Q),\varepsilon)$ denotes the smallest number of $L_2(Q)$-balls of radius at most $\varepsilon$ required to cover $\mathcal F$. (See e.g., de la Peña and Giné (1999).) It turns out that empirical processes or $U$-processes indexed by these classes of functions are very well behaved, particularly if $F$ is uniformly bounded and if the class $\mathcal F$ is countable. For instance, we have the following version of an exponential inequality of Talagrand (1996) from Einmahl and Mason (2000) and Giné and Guillou (2001, 2002). Let $P$ be a probability measure on $S$ and let $X_i: S^{\mathbb N}\to S$ be the coordinate functions of $S^{\mathbb N}$, which are i.i.d. ($P$), and set $\Pr=P^{\mathbb N}$. If the class $\mathcal F$ is VC type, bounded and countable, then there exist $0<C_i<\infty$, $1\le i\le 3$, depending on $v$ and $A$ such that, for all $t$ satisfying

$$C_1\sqrt n\,\sigma\sqrt{\log\frac{2\|F\|_\infty}{\sigma}}\le t\le\frac{n\sigma^2}{\|F\|_\infty},$$

we have

$$\Pr\Big\{\max_{1\le k\le n}\Big\|\sum_{i=1}^k\big(f(X_i)-Pf\big)\Big\|_{\mathcal F}>t\Big\}\le C_2\exp\Big(-C_3\frac{t^2}{n\sigma^2}\Big),\qquad(60)$$

where

$$\|F\|_\infty^2\ge\sigma^2\ge\|{\rm Var}_P(f)\|_{\mathcal F}.$$

(Talagrand (1996) states his inequality only for the sum over $n$, but the same works for the maximum of the partial sums up to $n$ by a (sub)martingale argument that can be carried out because these inequalities are obtained by integrating bounds on the moment generating function; see e.g., Ledoux (2001).) Major (2006) also has a similar inequality for classes of functions of several variables. We will state his inequality for bounded VC type classes of functions of two variables only. Let $\mathcal F$ be such a class of functions and let $\|F\|_\infty^2\ge\sigma^2\ge\|{\rm Var}(f(X_1,X_2))\|_{\mathcal F}$. Let $\pi_2(f)(x,y)=f(x,y)-Ef(X,y)-Ef(x,X)+Ef(X,Y)$. Then, if $\mathcal F$ is a uniformly bounded, countable class of VC type, there exist $0<C_i<\infty$, $1\le i\le 3$, depending on $v$ and $A$ such that, for all $t$ satisfying

$$C_1n\sigma\log\frac{2\|F\|_\infty}{\sigma}\le t\le\frac{n^2\sigma^3}{\|F\|_\infty^2},$$

we have

$$\Pr\Big\{\Big\|\sum_{1\le i\ne j\le n}\pi_2f(X_i,X_j)\Big\|_{\mathcal F}>t\Big\}\le C_2\exp\Big(-C_3\frac{t}{n\sigma}\Big).\qquad(61)$$

Major states the theorem for $\{\pi_2f\}$ of VC type, but it is easy to see that if $\mathcal F$ is VC type for $F$ then $\{\pi_2f: f\in\mathcal F\}$ is VC type for the envelope $4F$.

It is also worth mentioning that (much easier to prove) moment bounds for the above quantities are also available (e.g. in Giné and Mason (2007) and references therein) and that they can be used instead of Talagrand's and Major's inequalities if one is only interested in the 'in probability' version of Theorems 1 and 2.

We now show that the classes of functions appearing in the previous sections are of VC type, and the suprema countable. We will do this in all detail for the class $\mathcal F$ in (15), and will give indications for the rest of the classes of functions used.

First we observe that the sup inside the probability bound in (14) is actually a supremum over the set $\{t\in\mathbb Q,\ h\in\mathbb Q\cap[h_{2^k},h_{2^{k-1}})\}$ by the continuity properties of $K$ and the indicator of $|t-X_i|<hB$. This observation applies to all the other classes of functions in the previous two sections.

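Lemma 4 Let $\mathcal F$ be as in (15) with $K$ and $f$ satisfying Assumptions 1. Then, there exists a universal constant $R$ such that for every Borel probability measure $Q$ on $\mathbb R$,

$$N(\mathcal F,L_2(Q),\varepsilon)\le\Big(\frac{R\|K\|_V\|f\|_\infty^{1/2}}{\varepsilon}\Big)^{22},\qquad 0<\varepsilon<\|K\|_V\|f\|_\infty^{1/2},\qquad(62)$$

where $\|K\|_V$ is the total variation norm of $K$, that is, $\mathcal F$ is of VC type with envelope $\|K\|_V\|f\|_\infty^{1/2}$ with $A=R$ independent of $K$ and $f$ and $v=22$.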

Proof. By adding an arbitrarily small strictly increasing function to the positive and negative variation functions of $K$, we have $K=K_1-K_2$ with $K_i$ strictly increasing, positive and bounded, with $\|K_1\|_\infty$ ($\|K_2\|_\infty$) arbitrarily close to the positive (negative) variation of $K$. Let $\mathcal K_1$ be the class of functions obtained from $\mathcal F$ by replacing $K$ by $K_1$ and deleting the indicator in each of the functions in the class. Then, if we assume $f(x)>0$ for all $x$, the subgraphs of the functions in the class $\mathcal K_1$ have the form

$$\Big\{(x,u): K_1\Big(\frac{t-x}{h}f^{1/2}(x)\Big)f^{1/2}(x)\ge u\Big\} = \Big\{(x,u): \frac{t-x}{h}f^{1/2}(x)\ge K_1^{-1}\big(u/f^{1/2}(x)\big)\Big\} = \Big\{(x,u): \frac th f^{1/2}(x)-\frac 1h xf^{1/2}(x)-K_1^{-1}\big(u/f^{1/2}(x)\big)\ge 0\Big\},$$

and so they are the positivity sets of functions from the linear space of functions of the two variables $u$ and $x$ spanned by $f^{1/2}(x)$, $xf^{1/2}(x)$ and $K_1^{-1}(u/f^{1/2}(x))$. Hence, by a result of Dudley (e.g. Proposition 5.1.12 in de la Peña and Giné (1999)) the subgraphs of $\mathcal K_1$ are VC of index 4. If the set $\{x: f(x)=0\}$ is not empty, the same argument above shows that the class of subsets of $S=\{x: f(x)>0\}\times\mathbb R$, $\{(x,u)\in S: K_1(\frac{t-x}{h}f^{1/2}(x))f^{1/2}(x)\ge u\}$ is VC of index 4, and therefore so is the class of subgraphs of $\mathcal K_1$, which is obtained from this one by taking the union of each of these sets with the set $\{x: f(x)=0\}\times\{u\le 0\}$ (which is disjoint with all of them). Therefore, in either case, by the Dudley-Pollard entropy theorem for VC-subgraph classes (e.g., loc. cit. Theorem 5.1.5), we have

$$N(\mathcal K_1,L_2(P),\varepsilon)\le\Big(\frac{A\|K_1\|_\infty\|f\|_\infty^{1/2}}{\varepsilon}\Big)^8,\qquad 0<\varepsilon<\|K_1\|_\infty\|f\|_\infty^{1/2},$$

where $A$ is a universal constant, hence,

$$N(\mathcal K_1,L_2(P),\varepsilon)\le\Big(\frac{A\|K\|_+\|f\|_\infty^{1/2}}{\varepsilon}\Big)^8,\qquad 0<\varepsilon<\|K\|_+\|f\|_\infty^{1/2},\qquad(63)$$

where $\|K\|_+$ is the positive variation seminorm of $K$. The analogous bound holds for $\mathcal K_2$, defined with $K_2$ replacing $K$ in $\mathcal F$. Since, as is well known, the set $\mathcal J$ of all indicator functions of intervals in $\mathbb R$ is VC of order 3, we also have

$$N(\mathcal J,L_2(P),\varepsilon)\le\Big(\frac{A}{\varepsilon}\Big)^6,\qquad 0<\varepsilon<1,\qquad(64)$$

for another universal constant $A$. Now, any $H\in\mathcal F$ can be written as $H=k_1g-k_2g$ for $k_i\in\mathcal K_i$ and $g\in\mathcal J$, so that, for any probability measure $Q$ we have

$$Q(H-\bar H)^2 = Q\big((k_1-k_2)g-(\bar k_1-\bar k_2)\bar g\big)^2 \le 4Q(k_1-\bar k_1)^2+4Q(k_2-\bar k_2)^2+2\|K\|_V^2\|f\|_\infty Q(g-\bar g)^2.$$

Given $\varepsilon>0$ let $\delta_1=\varepsilon/4$ and $\delta_2=\varepsilon/(2\|K\|_V\|f\|_\infty^{1/2})$. Then, if the collections of functions $k_1^{(1)},\dots,k_1^{(N_1)}$ and $k_2^{(1)},\dots,k_2^{(N_2)}$ are $L_2(Q)$ $\delta_1$-dense respectively in the classes $\mathcal K_1$, $\mathcal K_2$, and $g_1,\dots,g_{N_3}$ are $L_2(Q)$ $\delta_2$-dense in the class $\mathcal J$, with optimal cardinalities $N_i=N(\mathcal K_i,L_2(Q),\delta_1)$, $i=1,2$, and $N_3=N(\mathcal J,L_2(Q),\delta_2)$, then, by the previous inequality, the functions $(k_1^{(i)}-k_2^{(j)})g_l$ are $L_2(Q)$ $\varepsilon$-dense in $\mathcal F$. Since there are at most $N_1N_2N_3$ such functions (this estimate may not be optimal), the inequality (62) follows.

■
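The bookkeeping in the last step of this proof is short enough to check by machine. The sketch below is illustrative only: `c` stands for $\|K\|_V\|f\|_\infty^{1/2}$, the numerical values are arbitrary, the function names are hypothetical, and the exponent 6 attributed to the class of interval indicators is an assumption inferred from $v=22$ together with the exponent 8 of the two monotone classes.

```python
from fractions import Fraction as Fr


def net_radii(eps, c):
    """Radii delta_1, delta_2 of the three nets; c plays ||K||_V ||f||_inf^{1/2}."""
    return eps / 4, eps / (2 * c)


def squared_distance_bound(d1, d2, c):
    """Right side of the bound on Q(H - Hbar)^2 when each net is hit."""
    return 4 * d1**2 + 4 * d1**2 + 2 * c**2 * d2**2


eps, c = Fr(1, 10), Fr(7, 3)     # arbitrary illustrative values
d1, d2 = net_radii(eps, c)
bound = squared_distance_bound(d1, d2, c)

# Exponents of 1/eps contributed by N_1, N_2 (monotone classes) and N_3.
exponent = 8 + 8 + 6

print(bound == eps**2, exponent)
```

The three contributions add up to exactly $\varepsilon^2$, so the product net is $\varepsilon$-dense, and the exponents add up to 22. One admissible (not necessarily optimal) reading of the constant is $R=4A$ with $A$ the larger of the universal constants involved, since each factor is then at most $4A\|K\|_V\|f\|_\infty^{1/2}/\varepsilon$.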

A similar result holds for the classes $\mathcal Q_n$ defined by (50) in the proof of Lemma 1, the classes of functions $\mathcal Q_n$ defined by (54) and the classes $\{H_t(x,y): t\in D_r\}$ in the proof of Lemma 2, as all these classes have the same structure as $\mathcal F$ in Lemma 4.

The class of functions $\mathcal G:=\{g(t,\cdot): t\in D_r\}$ where $g$ is defined in (57) in the proof of Lemma 3 requires some extra considerations. Let $Q$ be any probability measure on the line and let $s,t\in D_r$. Then, using Hölder, we have

$$
\begin{aligned}
E_Q\big(g(t,x)-g(s,x)\big)^2 &\le \int E_X\Big[f(X)^{-1}K^2\Big(\frac{X-x}{h_{1,n}}\Big)\Big]\\
&\qquad\times E_X\Big[\Big(L\Big(\frac{t-X}{h_{2,n}}f^{1/2}(X)\Big)I(|t-X|<h_{2,n}B)-L\Big(\frac{s-X}{h_{2,n}}f^{1/2}(X)\Big)I(|s-X|<h_{2,n}B)\Big)^2\Big]\,dQ(x)\\
&= h_{1,n}\|K\|_2^2\int\Big(L\Big(\frac{t-y}{h_{2,n}}f^{1/2}(y)\Big)I(|t-y|<h_{2,n}B)-L\Big(\frac{s-y}{h_{2,n}}f^{1/2}(y)\Big)I(|s-y|<h_{2,n}B)\Big)^2f(y)\,dy\\
&= h_{1,n}\|K\|_2^2E_f(\ell_t-\ell_s)^2,
\end{aligned}\qquad(65)
$$

where $\ell_s$ and $\ell_t$ are functions from the class

$$\mathcal L:=\Big\{L\Big(\frac{t-\cdot}{h}f^{1/2}(\cdot)\Big)I(|t-\cdot|<hB): t\in\mathbb R,\ h>0\Big\},$$

which is VC with a constant envelope by Lemma 4. This lemma then proves that for all $Q$,

$$N(\mathcal G,L_2(Q),\varepsilon)\le\Big(\frac{R\|L\|_V\|K\|_2h_{1,n}^{1/2}}{\varepsilon}\Big)^{22},\qquad 0<\varepsilon<\|L\|_V\|K\|_2h_{1,n}^{1/2},\qquad(66)$$

in particular, $\mathcal G$ is VC for the constant envelope $\|L\|_V\|K\|_2h_{1,n}^{1/2}$, with characteristics $A=R$ and $v=22$.
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